Apparatus and method for verification of system interconnect upon hot-plugging of electronic field replaceable units

ABSTRACT

A field replaceable unit has a connector with first and second interconnect apparatus coupled to the connector. The field replaceable unit has test apparatus coupled to the first and second interconnect apparatus capable of testing connections through the connector to the first interconnect apparatus under control of signals on the second interconnect apparatus. The field replaceable unit is capable of being hot-plugged. In an embodiment, the second interconnect apparatus is of the JTAG type. Also claimed is a method of testing interconnect between the field replaceable unit and another unit of a system into which it has been hot-plugged.

FIELD OF THE APPLICATION

[0001] The application relates to the field of self-test of electronic systems, including computer systems, having hot-pluggable field-replaceable units. In particular, the application relates to methods for verifying functionality and correct connection of high-speed interconnect apparatus of the field replaceable units. Disclosed embodiments relate specifically to self-test of hot-pluggable field replaceable units in high performance and high reliability computing systems.

BACKGROUND OF THE APPLICATION

[0002] Field Replaceable Units

[0003] Many electronic systems, including most computer systems, contain multiple field replaceable units (FRUs). FRUs generally include any portion of an electronic system that is designed to be replaced without requiring transport of the entire system to a repair facility. FRUs include input/output cards and processor modules, including PCI bus cards, of computer systems. FRUs also include channel interface cards of telephone switching and other communications systems.

[0004] As with anything else built by man, electronic circuitry can fail. Electronic systems, including computers, are often repaired by replacing one or more FRUs. FRUs may also be added to a system, or exchanged with others in a system, to reconfigure or expand the system to meet particular system requirements.

[0005] Hot Plugging

[0006] It is often undesirable to completely shut down an electronic system for maintenance, even when maintenance requires replacement of, or addition of, one or more FRUs. For example, it is undesirable to shut down a telephone switching machine serving ten thousand customers so that a trunk interface card can be replaced. Similarly, it is undesirable to shut down an entire airline reservation-tracking computer system for minor repairs and reconfiguration. Many electronic communications and computing systems therefore allow hot-plugging (also known as hot-socketing) of FRUs to minimize the need for system shutdowns during repair and reconfiguration.

[0007] An example hot-pluggable FRU is a PCMCIA expansion card such as are commonly used with notebook computers. PCMCIA cards have a connector supporting moderately high-speed digital interconnect in the form of a parallel digital bus, as well as power, control, and reset connections.

[0008] High Speed Interconnect

[0009] Many FRUs of modern communications and computing systems have connectors supporting one or more high-speed digital interconnect systems. These high speed interconnect systems typically involve one or more parallel busses, such as the PCI or PCMCIA busses, allowing for two, three or more connections. Many other bus types are also known. High speed interconnect may also be point-to-point interconnect having two connections.

[0010] FRUs may incorporate processors and/or memory. They may also incorporate input-output (IO) devices such as network interfaces, disk drives, disk drive controllers, display and keyboard adapters, power supplies, and many other components of communications and computing systems.

[0011] Designs are known for systems wherein at least some FRUs can be exchanged while other components of the system continue operation. For example, many RAID (Redundant Array of Independent Disks) array systems provide for replacement of failed drives and reconstruction of datasets without requiring system shutdown. These systems often provide mechanisms for sequencing power and reset connections to an FRU. These designs also often provide mechanisms for self testing each FRU after it is inserted into a system.

[0012] JTAG

[0013] The IEEE 1149.1 serial bus, also known as the JTAG bus, was devised for testing of inactive FRUs by providing access from a tester to circuitry within the FRU. In particular, the JTAG bus provides the ability to perform a boundary scan on each integrated circuit on an FRU. The tester can verify connectivity of the integrated circuits of an FRU and verify that they are installed correctly. The JTAG bus provides for interconnection of one or more integrated circuits in a chain, any of which may be addressed by the tester. Typically, multiple devices of a circuit board are interconnected into a JTAG chain.

[0014] The JTAG bus uses four wires. These include a serial data-in line, a serial data-out line, a clock line, and a test mode select line. Typically the data-out line of a first chip in a chain couples in daisy-chain configuration to the data-in line of a second chip of the chain, and the data-out line of the second chip couples to the data-in line of a third; the data-out line of the last chip in the chain is brought back to the test connector.
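
The daisy-chain wiring described above can be illustrated with a small simulation. This is a simplified sketch, not an IEEE 1149.1 implementation: each chip is modeled as a one-bit register between its data-in and data-out lines (an assumption standing in for whichever register is currently selected), the test mode select line is not modeled, and the names `ChainDevice` and `shift_through_chain` are hypothetical.

```python
from collections import deque


class ChainDevice:
    """One chip in the chain, modeled as a fixed-length shift register (assumption)."""

    def __init__(self, length=1):
        self.bits = deque([0] * length)

    def clock(self, data_in):
        """Shift one bit in on data-in and return the bit pushed out on data-out."""
        self.bits.appendleft(data_in)
        return self.bits.pop()


def shift_through_chain(devices, bits_in):
    """Clock a bit stream through the chain; return what reaches the test connector."""
    bits_out = []
    for bit in bits_in:
        # On each clock, every device receives the previous device's old output.
        for device in devices:
            bit = device.clock(bit)
        bits_out.append(bit)
    return bits_out
```

With three devices in the chain, a bit shifted in from the tester reappears at the test connector three clocks later, so the tester must clock in as many extra bits as there are register stages in the chain.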

[0015] The IEEE 1152 bus is a newer, enhanced, version of the 1149.1 JTAG bus. References herein to a JTAG bus are intended to include both the 1149.1 and 1152 variations.

[0016] The JTAG bus is most often used for testing an FRU in a factory environment, typically when these FRUs are inserted into FRU test apparatus for production testing. For purposes of this application, the term system excludes FRU test apparatus as used in production testing; the term system includes computer systems where FRUs operate to run operating system and user programs.

[0017] Installation of FRUs

[0018] When FRUs are inserted into a system, it is possible that some wires of connectors may make proper contact with circuitry of the FRU while other wires may not couple correctly; they may be resistive or remain open. This is particularly likely if the connectors are dirty, or if circuit boards of the system and FRU flex during insertion. If the connections coupling the FRU to other parts of the system can be tested for resistive and open wires, an installer could repair the installation by cleaning the connectors and reseating the FRU.

[0019] Newly installed FRUs may also have cold solder joints or electrostatic discharge (ESD) damage that can also impair communications over connections coupling the FRU to other parts of the system. While cold solder joints and ESD damage cannot be repaired by cleaning connectors, it is desirable to identify FRUs having these faults and avoid using them in systems.

[0020] In modern high performance systems, error correcting coding (ECC) may be used on some high speed interconnect, including high speed interconnect crossing connections between an FRU and remaining parts of the system. ECC can, however, mask the effect of resistive or open wires of connectors coupling an FRU to remaining parts of the system. This masking occurs because the ECC makes the system appear to work correctly even with resistive or open wires. It is desirable to identify resistive and open wires of connectors protected by ECC since resistive and open wires can cause other faults, normally correctable through ECC, to be uncorrectable, thereby degrading system reliability.
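
The masking effect can be shown with a toy code. The sketch below assumes a Hamming(7,4) single-error-correcting code and assumes an open wire reads as 0; neither choice comes from this application, which does not name a particular ECC scheme. The function names are hypothetical.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word):
    """Correct at most one flipped bit, then return the 4 data bits."""
    b = list(word)
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]
    s4 = b[3] ^ b[4] ^ b[5] ^ b[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based position of the suspected bad bit
    if position:
        b[position - 1] ^= 1
    return [b[2], b[4], b[5], b[6]]


def through_connector(word, open_wire=None, flipped_wire=None):
    """Model a connector: an open wire reads 0 (assumption); a transient flips a bit."""
    received = list(word)
    if open_wire is not None:
        received[open_wire] = 0
    if flipped_wire is not None:
        received[flipped_wire] ^= 1
    return received
```

With data bits `[1, 0, 1, 1]`, an open wire at index 2 is silently corrected by `decode`, so the link appears healthy. A transient error on a second wire then exceeds what the code can correct and the decoded data is wrong. Comparing the raw received word against the transmitted word, which corresponds to testing with ECC disabled, exposes the open wire directly.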

[0021] It is therefore desirable to test connections between an FRU and remaining parts of a system upon installation or replacement of an FRU.

SUMMARY OF THE APPLICATION

[0022] An FRU having high speed interconnect is equipped with a test-access path. In a particular embodiment the test-access path is a JTAG-compliant scan path.

[0023] Upon insertion of an FRU into the system, power and reset signals are applied to the FRU. A processor of the system then uses the test-access path of the FRU to test high-speed interconnect paths across connectors coupling the FRU to the system. In a particular embodiment, the high-speed interconnect is protected by ECC; ECC syndrome lines are tested separately and interconnect data lines are tested with ECC disabled.

[0024] Any problems detected with the high-speed interconnect paths are reported to the installer. The installer may then correct the problem by re-seating the FRU in its connectors, or replacing the FRU.

[0025] Once the high-speed interconnect has been tested, reset signals applied to the FRU are released.

[0026] In a particular embodiment, the processor of the system that uses the test-access path is a system management processor of the system.

[0027] In a particular embodiment, a high-speed interconnect stimulator is provided for testing the high speed interconnect and its connection to the newly inserted FRU. In an alternative embodiment, a scan path of a second FRU already installed in the system is used to test the high speed interconnect and its connection to the newly inserted FRU.

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] FIG. 1 is a block diagram of a computing system having multiple FRUs.

[0029] FIG. 2 is a block diagram of a generic FRU inserted in a connector of a system, showing test circuitry of the FRU and system.

[0030] FIG. 3 is a flowchart illustrating insertion of, and testing interconnect paths coupled to, an FRU.

[0031] FIG. 4 is a block diagram of an alternative embodiment of a newly inserted generic FRU in a connector of a system, where a scan path and high-speed interconnect interface of an FRU already installed in the system is used for testing newly inserted FRUs.

[0032] FIG. 5 is a flowchart illustrating additional steps associated with insertion of, and testing interconnect paths coupled to, a daughter FRU.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0033] A computer system 100 such as is illustrated in FIG. 1 has one or more processor-memory FRUs 102, 104, interconnected by high-speed interconnect 106. High-speed interconnect 106 is also connected to one or more network interface FRUs 108, 110, one or more disk interface FRUs 112, and a console FRU 114. There is also a system management processor 116 intended to perform system management functions while not executing production software. Disk interface FRUs 112 are coupled to one or more disk drive FRUs 118.

[0034] System management processor 116 is coupled through a test interconnect 120 to the processor-memory FRUs 102, 104, network interface FRUs 108, 110, disk interface FRUs 112, and console FRU 114. In a particular embodiment, test interconnect 120 incorporates JTAG scan chains. System management processor 116 is also coupled through an interconnect stimulator 122 to high speed interconnect 106.

[0035] In normal operation, the processor/memory FRUs 102, 104, communicate with each other, the network interface FRUs 108, 110, disk interface FRU 112, and console FRU 114, over the high speed interconnect 106.

[0036] FIG. 2 illustrates a generic FRU 202, which may be a processor/memory FRU 102, 104, network interface FRU 108, 110, disk interface FRU 112, a console FRU 114, or another FRU of system 100 capable of connecting to high-speed interconnect 106 and for which hot plug capability is desired.

[0037] Generic FRU 202 has a connector 204 whereby it may be attached to a mating connector 206 or 207 of system 100. In a particular embodiment, connector 204 is an edge connector; in another embodiment, connector 204 is a multiple-pin PCMCIA connector. It is anticipated that connector 204 may be of additional types. In the particular embodiment, connector 204 is designed such that, as the FRU 202 is inserted into the mating connector 206, power, ground, and reset lines of connector 204 couple to corresponding wires of the mating connector before high speed interconnect 106 lines of connector 204.

[0038] Generic FRU 202 has a test interconnect interface in the form of JTAG slave interface 208, controlled by JTAG signals 210 of test interconnect 120. These JTAG signals 210 are brought to connector 204 such that JTAG slave interface 208 is capable of coupling to test interconnect 120 through the mating connector 206.

[0039] Generic FRU 202 has high-speed interconnect interface 209 coupled to JTAG slave interface 208. During normal operation, high speed interconnect interface 209 provides apparatus for remaining circuitry 211 of the FRU to communicate over high speed interconnect 106. The high-speed interconnect interface 209 incorporates test apparatus such that JTAG slave interface 208 is capable of reading signals received by high-speed interconnect interface 209 from high speed interconnect 106, and of causing high-speed interconnect interface 209 to arbitrate for and place signals on high speed interconnect 106.

[0040] The system management processor 116 has a multiple-channel JTAG master 220 such that each mating connector 206 of the system is coupled to a separate channel of the JTAG master 220. System management processor 116 also has a stimulator 222 capable of placing predetermined patterns of signals on high speed interconnect 106.

[0041] When it is desired to replace an old FRU, which may be a defective or obsolete FRU of system 100, such as processor/memory FRU 104 or network interface FRU 110, the FRU is rendered quiescent 302 (FIG. 3) through commands entered on system console 114. In the particular embodiment, rendering the FRU quiescent is done without shutting down system 100. The old FRU is then removed 304 from mating connector 206 of system 100.

[0042] Next, a new FRU, which may be a replacement, an upgraded, or an additional FRU, is inserted 306 such that its connector 204 engages with mating connector 206 of system 100. The new FRU is held quiescent while an FRU-insertion signal is generated 308. The system management processor 116 then interrogates the FRU to identify 309 the FRU's type.

[0043] The system management processor 116, acting through high speed interconnect stimulator 222, then arbitrates for high-speed interconnect 106 and places 310 known patterns thereon. When placing known patterns 310 on high-speed interconnect 106, ECC features are disabled so that all lines may be tested. The system management processor then uses test interconnect 120 to read 312 the high speed interconnect interface 209 of the FRU 202 and verify correct receipt of the known patterns. This sequence verifies that the FRU is capable of receiving patterns from the high-speed interconnect correctly.

[0044] Next, system management processor 116 uses test interconnect 120 to cause 314 the high speed interconnect interface 209 of FRU 202 to arbitrate for, and place known patterns on, high speed interconnect 106. The system management processor 116 then reads 316 the known patterns from the high speed interconnect 106 and verifies that they are correct. This sequence verifies that the FRU can transmit patterns correctly on the high speed interconnect. If any error is detected during reading of patterns 312 or verifying patterns 316, an error message is generated 320; otherwise the FRU is started 321 by releasing its reset signals.

[0045] Should an error have been detected and an error message generated 320, an installer may reseat 322 the FRU in the mating connector 206. If this is done, the high-speed interconnect to the FRU is retested 324 by repeating the steps of holding the FRU quiescent 308, identifying the FRU type 309, placing known patterns 310 on the interconnect, reading and verifying 312 the patterns, transmitting 314 patterns from the FRU, and verifying 316 the patterns. If the retest passes, the FRU is started by releasing its reset signals; if not, the installer may replace 326 the FRU.
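
The insert, test, reseat, and retest flow of FIG. 3 can be sketched as follows. Everything here is a hypothetical simulation: `SimulatedSlot` stands in for a mating connector with an FRU in it, an open line is assumed to read as 0, and walking-one and walking-zero words are an assumed choice for the "known patterns", which the description does not specify. The step numbers in the log strings refer to the reference numerals used above.

```python
def walking_patterns(width):
    """Walking-one then walking-zero words (an assumed set of known patterns)."""
    mask = (1 << width) - 1
    ones = [1 << i for i in range(width)]
    return ones + [mask ^ p for p in ones]


class SimulatedSlot:
    """Hypothetical model of a mating connector holding a newly inserted FRU."""

    def __init__(self, rx_open=(), tx_open=(), fixed_by_reseat=True):
        self.rx_mask = sum(1 << i for i in rx_open)  # lines open toward the FRU
        self.tx_mask = sum(1 << i for i in tx_open)  # lines open from the FRU
        self.fixed_by_reseat = fixed_by_reseat
        self.in_reset = True  # FRU is held quiescent until testing passes

    def to_fru(self, value):
        """Pattern driven by the system, as seen by the FRU's interface."""
        return value & ~self.rx_mask

    def from_fru(self, value):
        """Pattern driven by the FRU, as seen on the system side."""
        return value & ~self.tx_mask

    def reseat(self):
        if self.fixed_by_reseat:
            self.rx_mask = self.tx_mask = 0


def failing_lines(slot, width=8):
    """Return indices of lines that fail in either direction."""
    bad = 0
    for pattern in walking_patterns(width):
        bad |= pattern ^ slot.to_fru(pattern)    # place 310, read and verify 312
        bad |= pattern ^ slot.from_fru(pattern)  # transmit 314, verify 316
    return [i for i in range(width) if (bad >> i) & 1]


def insert_and_test(slot, reseats_allowed=1):
    """Run the test, allowing a limited number of reseat-and-retest cycles."""
    log = []
    for attempt in range(reseats_allowed + 1):
        bad = failing_lines(slot)
        if not bad:
            slot.in_reset = False  # release reset signals
            log.append("started 321")
            return log
        log.append(f"error 320: lines {bad}")
        if attempt < reseats_allowed:
            slot.reseat()
            log.append("reseated 322")
    log.append("replace 326")
    return log
```

A slot whose only fault is a dirty contact passes after one reseat, while a slot with a persistent fault leaves the FRU in reset and ends with a recommendation to replace it.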

[0046] In an alternative embodiment, illustrated in FIG. 4, the high speed interconnect stimulator 222 of the embodiment illustrated in FIG. 2 is not needed. In a system 400 of this embodiment, system management processor 402 communicates with a JTAG master 404, and a first FRU 406 is installed in a mating connector 408 in the system 400.

[0047] A new FRU 410, which may be a replacement, an upgraded, or an additional FRU, is inserted 306 into a mating connector 412 of the system such that its connector 414 engages with mating connector 412. The new FRU is held quiescent while an FRU-insertion signal is generated 308. System management processor 402 then interrogates the FRU to identify 309 the FRU's type.

[0048] The system management processor 402 then selects an FRU 406 already present in the system 400 and capable of communicating with newly installed FRU 410. There may, but need not, be additional FRUs in additional mating connectors 413 in the system; these additional FRUs may but need not be capable of communicating over the same high speed interconnect 420 as that used for communications between the already present FRU 406 and the newly installed FRU 410. System management processor 402 then communicates with a JTAG slave 416 of FRU 406 to instruct high speed interconnect interface 418 of FRU 406 to briefly interrupt its operation by arbitrating for, and placing 310 known patterns on, high speed interconnect 420. As in the embodiment of FIG. 2, ECC features are disabled when placing known patterns 310 on high-speed interconnect 420 so that all lines may be tested. The system management processor then uses JTAG slave 422 of the newly inserted FRU 410 to read 312 the high speed interconnect interface 424 of FRU 410 and verify correct receipt of the known patterns. This sequence verifies that the FRU is capable of receiving patterns from the high-speed interconnect correctly.

[0049] Next, system management processor 402 uses JTAG master 404 to communicate through JTAG slave 422 to the high speed interconnect interface 424 of FRU 410. Management processor 402 commands high speed interconnect interface 424 to arbitrate for, and place known patterns on, high speed interconnect 420. These known patterns are addressed to, and received by, high speed interconnect interface 418 of the earlier installed FRU 406. The system management processor 402 then reads 316, through JTAG slave 416 and JTAG master 404, the known patterns from the high speed interconnect interface 418 of the earlier installed FRU 406 and verifies that they are correct. This sequence verifies that the FRU can transmit patterns correctly on the high speed interconnect.

[0050] If any error is detected during reading of patterns 312 or verifying patterns 316, an error message is generated 320; otherwise the FRU is started 321 by releasing its reset signals.
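
The FIG. 4 arrangement, in which an already installed FRU takes the place of a dedicated stimulator, can be sketched as a pair of interfaces on a shared bus. The classes `BusInterface` and `SharedBus` and the function `verify_pair` are hypothetical; direct method calls stand in for accesses that the management processor would make through the JTAG master and slaves, and an open line is again assumed to read as 0.

```python
class BusInterface:
    """Hypothetical high-speed interconnect interface reachable over a test path."""

    def __init__(self, name, open_lines=()):
        self.name = name
        self.open_mask = sum(1 << i for i in open_lines)
        self.captured = None  # last word seen on the bus

    def _through_connector(self, value):
        # Lines that are open at this FRU's connector read as 0 (assumption).
        return value & ~self.open_mask

    def drive(self, bus, value):
        bus.broadcast(self, self._through_connector(value))

    def capture(self, value):
        self.captured = self._through_connector(value)


class SharedBus:
    """Minimal multidrop bus: every attached interface except the sender captures."""

    def __init__(self):
        self.interfaces = []

    def attach(self, interface):
        self.interfaces.append(interface)

    def broadcast(self, sender, value):
        for interface in self.interfaces:
            if interface is not sender:
                interface.capture(value)


def verify_pair(bus, peer, new, patterns):
    """Test the new FRU's receive path, then its transmit path, against a peer."""
    errors = []
    for pattern in patterns:
        peer.drive(bus, pattern)        # peer places pattern (310)
        if new.captured != pattern:     # read new FRU's interface (312)
            errors.append(("receive", pattern, new.captured))
        new.drive(bus, pattern)         # new FRU places pattern (314)
        if peer.captured != pattern:    # read peer's interface (316)
            errors.append(("transmit", pattern, peer.captured))
    return errors
```

One consequence of this design, visible in the sketch, is that a mismatch does not by itself say which connector is at fault, since a defect at the peer's connector would produce the same symptoms; the sketch simply reports the direction and the word observed.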

[0051] It is anticipated that the sequence of verifying that the newly inserted FRU 410 is capable of receiving known patterns correctly (310-312) and transmitting known patterns correctly (314-316) can be reversed without departing from the spirit of the invention. In an alternative embodiment, correct transmission is verified before correct reception is verified.

[0052] The method is applicable to point-to-point high-speed interconnect as well as to multidrop bussing. The method is also applicable to FRUs, such as FRU 410, that have daughter FRUs, such as daughter FRU 440. When an FRU 410 having a daughter FRU 440 is inserted into the system, the system management processor 402 identifies 309 and tests 309-316 the ability of FRU 410 to communicate with other parts of the system 400 as heretofore described. Should testing fail, error messages are generated 320 as heretofore described. Should testing succeed, testing 500 (FIG. 5) of FRU 410 to daughter FRU 440 communication is performed before the FRU is started 321.

[0053] In an embodiment, testing 500 (FIG. 5) of FRU 410 to daughter FRU 440 communication is performed by system management processor 402 through a slave system management processor (SMP) 442 on FRU 410, which communicates with a JTAG master 444 on FRU 410. In an alternative embodiment, system management processor 402 communicates directly with JTAG master 444.

[0054] Under control of the system management processor 402, the SMP instructs 504 FRU 410's daughter-connector high speed interconnect interface 446 to place known patterns on high speed interconnect 450. High speed interconnect 450 is that used during normal operation for communications between FRU 410 and daughter FRU 440. The SMP then uses a JTAG slave port 448 of a high-speed interconnect interface 452 to read and verify 506 the known patterns as received by the high-speed interconnect interface 452 on the daughter FRU 440 side of the daughter FRU connector 454.

[0055] Under control of the system management processor 402, the SMP 442 then causes 508 daughter FRU 440's high speed interconnect interface 452 to place known patterns on high speed interconnect 450. The SMP then uses high-speed interconnect interface 446 to read and verify 510 the known patterns as received on the FRU 410 side of connector 454.

[0056] Should any error be detected during either step of read and verify 506, 510, appropriate error messages are generated 512. If no error is detected, operation of both daughter FRU 440 and FRU 410 is started 514 by releasing their reset signals.
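
The ordering constraint of FIG. 5, in which the link to the system is tested first, the daughter link second, and reset is released only when both pass, can be sketched as a small gating function. The function `bring_up` and its callable arguments are hypothetical; each callable is assumed to return a list of error strings, empty on success.

```python
def bring_up(test_system_link, test_daughter_link):
    """Gate the release of reset on both link tests; return the events that occurred."""
    events = []

    errors = test_system_link()  # FRU 410 to system 400 tests
    if errors:
        events.append(("error 320", errors))
        return events            # daughter link is not tested; FRU stays in reset

    errors = test_daughter_link()  # read and verify 506, 510
    if errors:
        events.append(("error 512", errors))
        return events

    # Both levels passed: start daughter FRU 440 and FRU 410 together.
    events.append(("release reset 514", ["daughter FRU 440", "FRU 410"]))
    return events
```

Passing the tests in as callables keeps the gating logic independent of how patterns are actually driven and read, which in the description is done through the slave SMP and the JTAG master on the FRU.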

[0057] While the foregoing has been particularly shown and described with reference to particular embodiments thereof, it will be understood by those skilled in the art that various other changes in the form and details may be made without departing from the spirit and scope hereof. It is to be understood that various changes may be made in adapting the description to different embodiments without departing from the broader concepts disclosed herein and comprehended by the claims that follow.

What is claimed is:
 1. A field replaceable unit comprising: at least one connector; a first interconnect interface coupled to the at least one connector; a test interconnect interface coupled to the at least one connector; and test apparatus associated with the first interconnect interface coupled to the first and test interconnect interfaces capable of testing connections through the at least one connector 204 to the first interconnect apparatus 209 under control of signals from the second 208 interconnect apparatus; wherein the field replaceable unit 202 is capable of being hot-plugged, and wherein the field replaceable unit is capable of testing connectivity in a system from at least one unit 222 of the system through the at least one connector 204 to the first interconnect apparatus 209.
 2. The field replaceable unit of claim 1, wherein the second interconnect apparatus is of the JTAG type.
 3. A system comprising: a first and a second field replaceable units, where each field replaceable unit comprises: at least one connector, a first interconnect apparatus coupled to the at least one connector, a second interconnect apparatus coupled to the at least one connector, and test apparatus coupled to the first and second interconnect apparatus capable of testing connections through the at least one connector to the first interconnect apparatus under control of signals on the second interconnect apparatus; and a system management processor; wherein the first interconnect apparatus of the first field replaceable unit is coupled to the first interconnect apparatus of the second field replaceable unit; wherein the system management processor is coupled to the test apparatus of the first field replaceable unit; wherein the system management processor is capable of instructing the first field replaceable unit to cease operation, thereby permitting replacement of the first field replaceable unit; wherein the system management processor is capable of using the test apparatus of the first field replaceable unit to test connectivity through the first interconnect apparatus of the first field replaceable unit to the first interconnect apparatus of the second field replaceable unit.
 4. The system of claim 3, wherein the second interconnect apparatus of the first field replaceable unit is JTAG-compatible.
 5. The system of claim 3, wherein the first interconnect apparatus of the first field replaceable unit includes error correction coding capability and wherein the system management processor is capable of disabling said error correction when using the test apparatus of the first field replaceable unit to test connectivity through the first interconnect apparatus of the first field replaceable unit to the first interconnect apparatus of the second field replaceable unit.
 6. A method of testing interconnect between a first field replaceable unit and a second unit of a system comprising the steps of: inserting the first field replaceable unit into a connector of the system, the first field replaceable unit having a high speed interconnect interface and a test interconnect; detecting insertion of the first field replaceable unit; verifying an ability of the first field replaceable unit to receive signals from a second unit of the system through its high speed interconnect interface; and verifying an ability of the first field replaceable unit to transmit signals to the second unit of the system through its high speed interconnect interface.
 7. The method of claim 6, wherein the steps of verifying an ability of the first field replaceable unit to receive signals from the second unit of the system through its high speed interconnect interface and verifying an ability of the first field replaceable unit to transmit signals to the second unit of the system are performed through a JTAG interface to the high speed interconnect interface of the first field replaceable unit.
 8. The method of claim 6, wherein the steps of verifying an ability of the first field replaceable unit to receive signals from the second unit of the system through its high speed interconnect interface and verifying an ability of the first field replaceable unit to transmit signals to the second unit of the system are performed by a management processor through a JTAG interface to the high speed interconnect interface of the first field replaceable unit and through a JTAG interface to a high speed interconnect interface of the second field replaceable unit.
 9. The method of claim 8, wherein error correction coding of the high speed interconnect interface of the field replaceable unit is disabled during the step of verifying an ability of the first field replaceable unit to transmit signals to a second unit of the system.
 10. The method of claim 6, further comprising the steps of: inserting a daughter field replaceable unit into a connector of the first field replaceable unit, the daughter field replaceable unit having a high speed interconnect interface and a test interconnect; verifying an ability of the first field replaceable unit to receive signals from the daughter field replaceable unit; and verifying an ability of the first field replaceable unit to transmit signals to the daughter field replaceable unit.
 11. The method of claim 10, wherein the steps of verifying an ability of the first field replaceable unit to receive signals from a second unit of the system and verifying an ability of the first field replaceable unit to transmit signals to the second unit of the system are performed under control of a management processor of the system.
 12. The method of claim 6, wherein the step of verifying an ability of the first field replaceable unit to receive signals from a second unit of the system through its high speed interconnect interface further comprises the step of having the second unit of the system arbitrate for a bus, the bus being coupled to the high speed interconnect interface of the first and second field replaceable units.